Abstract

Biopolymers are characterized by heterogeneous interactions, and usually perform their biological
tasks forming contacts within domains of limited size. Combining polymer theory with a replica
approach, we study the scaling properties of the probability of contact formation in random
heteropolymers as a function of their linear distance. It is found that close to or above the theta-point,
it is possible to define a contact probability which is typical (i.e. "self-averaging") for different
realizations of the heterogeneous interactions, and which displays an exponential cut-off, dependent
on temperature and on the interaction range. In many cases this cut-off is comparable with
the typical sizes of domains in biopolymers. While it is well known that disorder causes interesting
effects at low temperature, the behavior elucidated in the present study is an example of a
non-trivial effect at high temperature.

In biopolymers, the formation of contacts between monomers in non-compact conformations
is one of the basic physical processes which eventually determine the function of the
molecule. For example, in the case of proteins, the formation of non-covalent interactions
between distant amino acids in the denatured state is, in many cases, among the first steps
in the folding process [1]. In fact, the folding rate of proteins from their denatured state has
been shown to be correlated with the separation $\Delta l$ along the chain of the pairs of residues
which are in contact in the native conformation [2], and the same phenomenon was observed
in minimal protein models [3]. Similarly, in chromatin the contact probability between loci
was found to correlate with their linear distance on the megabase scale [4, 5].

Moreover, biopolymers usually display a rather tight upper limit in the value of $\Delta l$
associated with their contacts. Most of them are structured in domains of characteristic size,
and the formation of contacts takes place predominantly within such domains. For example,
proteins can be very long, but the distribution of domain sizes drops above 250 residues [6],
while longer proteins are usually built of multiple domains that fold independently and then
assemble together. Also in the case of chromatin, the polymer seems to form domains with
a maximum size of the order of 10^[...] to 10^[...] bases [7].

The simplest description we can give of contact formation in biopolymers is through a
homopolymeric model at equilibrium. In the elongated states of homopolymers, the contact
probability between two monomers depends on $\Delta l$ (when this is sufficiently large) according
to a power law $p \propto \Delta l^{-\kappa}$, where $\kappa = 3/2$ for ideal chains, that is when the effective
interaction between monomers is null, and $\kappa = 9/5$ [8] for a random coil made of mutually repulsive
monomers. As a matter of fact, the contact probability between the ends of unstructured
peptides with repeated AGQ sequence of length up to 29 residues, measured by FRET,
displays a power law with respect to the length of the peptide whose exponent changes from
1.55 in water to 1.7 in urea and guanidine [9]. Simulations of the unfolded state of globular,
single-domain proteins up to lengths of 250 monomers show a power-law dependence of
contact probabilities with exponent 2.0 [10]. Even the crystal structure of proteins seems
reminiscent of the associated denatured state and displays ideal-chain statistics [11], the
distribution of loop sizes having a maximum at 27 residues [12].

A homopolymeric model can account for the power-law relation between contact probability
and linear distance, but it cannot explain the presence of finite-size domains. On
the contrary, the long tail associated with power laws would suggest that the probability
of non-local contacts remains rather high even at large separation distances. In such a scenario,
the evolutionary advantage of shaping a biopolymer into domains to reduce the entropic
cost of forming the initial contacts is weak or null.
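
As a rough numerical illustration of how slowly a pure power law decays, the following minimal sketch evaluates an unnormalised contact probability $\Delta l^{-\kappa}$ for the two homopolymer exponents quoted above and recovers the exponent as the slope of a log-log fit. The function names and the list of separations are hypothetical choices made only for this example.

```python
import math

def power_law_contact_probability(delta_l, kappa):
    """Unnormalised homopolymer contact probability p ~ delta_l**(-kappa)."""
    return delta_l ** (-kappa)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Hypothetical set of separations along the chain (in monomers).
separations = [10, 20, 50, 100, 200, 500, 1000]
for name, kappa in (("ideal chain", 1.5), ("random coil", 1.8)):
    probs = [power_law_contact_probability(d, kappa) for d in separations]
    slope = loglog_slope(separations, probs)
    ratio = probs[-1] / probs[0]
    print(f"{name}: log-log slope = {slope:.2f}, p(1000)/p(10) = {ratio:.1e}")
```

In a log-log plot the fitted slope equals $-\kappa$, and the ratio between the probabilities at the largest and smallest separation gives a feeling for the weight of the tail.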

However, biomolecules are rarely homopolymers, and consequently it may be useful to
investigate the effect of the heterogeneity of the interactions on the probability of contact
formation. As a model, we will consider a random heteropolymer [13, 14] interacting with
the potential

$$U(\{r\}) = \sum_{i=2}^{N} u_0(|r_i - r_{i-1}| - a) + v \sum_{i<j} B_{ij}\,\delta(r_i - r_j), \qquad (1)$$

where $u_0(|r_i - r_{i-1}| - a)$ is any function that maintains the integrity of
the polymer, $a$ is the inter-monomer separation length, $N$ the number of monomers, $v$ is
the interaction volume and $B_{ij}$ accounts for two-body interactions, assumed as stochastic
quenched variables distributed according to a Gaussian of average $B_0$ and standard deviation
$\sigma$.
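
To make the notion of quenched Gaussian interactions concrete, the sketch below draws one realization of a symmetric matrix $B_{ij}$ with average $B_0$ and standard deviation $\sigma$ and checks its empirical moments. The helper name `sample_couplings`, the chain length and the seed are assumptions introduced only for illustration.

```python
import random
import statistics

def sample_couplings(n_monomers, b0, sigma, rng):
    """Symmetric matrix of quenched couplings B_ij ~ Gaussian(b0, sigma) for i != j."""
    b = [[0.0] * n_monomers for _ in range(n_monomers)]
    for i in range(n_monomers):
        for j in range(i + 1, n_monomers):
            # One independent draw per pair, shared by (i, j) and (j, i).
            b[i][j] = b[j][i] = rng.gauss(b0, sigma)
    return b

rng = random.Random(42)          # fixed seed: one specific disorder realization
n = 200                          # hypothetical chain length
B = sample_couplings(n, b0=1.0, sigma=1.0, rng=rng)
upper = [B[i][j] for i in range(n) for j in range(i + 1, n)]
print(f"empirical mean = {statistics.fmean(upper):.3f}")
print(f"empirical std  = {statistics.pstdev(upper):.3f}")
```

Each call with a different seed gives a different realization of the disorder; the matrix is then kept fixed ("quenched") while the chain conformations are sampled.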

One would be interested in the contact probability

$$p_{ij} \equiv \frac{Z_{ij}}{Z} = \frac{\int d\{r\}\,\delta(r_i - r_j)\exp[-\beta U(\{r\})]}{\int d\{r\}\exp[-\beta U(\{r\})]} \qquad (2)$$

between monomers $i$ and $j$ of the chain. But in disordered systems, relevant quantities
should be averaged over disorder, and this is meaningful only if their relative fluctuations
become negligible when the system is large enough, namely if they are self-averaging [15].
The standard Brout's argument [16] suggests that extensive quantities, like the free energy,
are self-averaging. The argument says that in a system with a given realization of the
disordered interactions, the relative fluctuations of extensive quantities go to zero in the
thermodynamic limit thanks to the central limit theorem. If one divides this system into $K$
weakly-interacting sub-systems, each of them can be regarded as a different realization of
the disordered interaction in an identical, although smaller, system, and consequently the
relative fluctuations of the extensive quantity over the disorder go to zero as well.
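
The central-limit mechanism behind Brout's argument can be illustrated with a toy calculation: if an extensive quantity is the sum of $K$ independent sub-system contributions, its relative fluctuation over disorder realizations decreases roughly as $K^{-1/2}$. The sketch below assumes Gaussian sub-system free energies with arbitrary mean and width; these numbers, like the function name, are hypothetical and not taken from the model.

```python
import random
import statistics

def relative_fluctuation(num_subsystems, num_realizations, rng):
    """Relative spread, over disorder realizations, of an extensive quantity
    built as a sum of independent sub-system contributions."""
    totals = []
    for _ in range(num_realizations):
        # Hypothetical sub-system free energies: mean -1, standard deviation 0.5.
        total = sum(rng.gauss(-1.0, 0.5) for _ in range(num_subsystems))
        totals.append(total)
    return statistics.pstdev(totals) / abs(statistics.fmean(totals))

rng = random.Random(0)
fluctuations = {k: relative_fluctuation(k, 400, rng) for k in (10, 100, 1000)}
for k, value in fluctuations.items():
    print(f"K = {k:5d}: relative fluctuation = {value:.4f}")
```

With these assumptions the printed values should shrink by roughly a factor $\sqrt{10}$ each time $K$ grows tenfold, which is the sense in which the quantity becomes self-averaging.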

The quantity $F_{ij} = -T \log p_{ij}$ is a free energy, but it is difficult to apply Brout's argument
to it. In fact, when dividing the whole chain into weakly-interacting sub-systems, these will
not be identical to each other, because one of them will contain the loop and the others not.
Instead, a self-averaging quantity is expected to be $F_{\Delta l} = -T (N - \Delta l)^{-1} \sum_m \log p_{m,m+\Delta l}$
in the case that $\Delta l \ll N$. In fact, dividing the chain into $K$ subsystems, $F_{\Delta l}$ results in the sum
of $(N - \Delta l)(K - 1)$ terms accounting for the free energy of an unconstrained polymer, and
$(N - \Delta l)$ terms accounting for the free energy of a looped polymer. The relative fluctuations
of these sums, which can be regarded as averages over the disordered interactions, go to zero
for $N - \Delta l \gg 1$, suggesting that $F_{\Delta l}$ is self-averaging.
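
Given a matrix of contact probabilities (for instance from a simulation), the chain-averaged free energy $F_{\Delta l}$ defined above is straightforward to evaluate. The following sketch assumes a hypothetical input matrix `p[i][j]` with zero-based indices; the ideal-chain-like test matrix is used only to check the arithmetic.

```python
import math

def loop_free_energy(contact_prob, delta_l, temperature):
    """F_dl = -T * (N - dl)^(-1) * sum_m log p[m][m + dl]."""
    n = len(contact_prob)
    if not 0 < delta_l < n:
        raise ValueError("delta_l must satisfy 0 < delta_l < N")
    # There are N - dl pairs (m, m + dl) along a chain of N monomers.
    logs = [math.log(contact_prob[m][m + delta_l]) for m in range(n - delta_l)]
    return -temperature * sum(logs) / (n - delta_l)

# Hypothetical input: an ideal-chain-like matrix p_ij = |i - j|**(-3/2).
N = 50
p = [[abs(i - j) ** -1.5 if i != j else 1.0 for j in range(N)] for i in range(N)]
print(loop_free_energy(p, delta_l=4, temperature=1.0))
```

For this translationally invariant test matrix every term of the sum is identical, so the result reduces to $\tfrac{3}{2} T \log \Delta l$; in a heteropolymer the terms differ from pair to pair and the average over $m$ is what suppresses the sequence-specific fluctuations.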

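The average of $F_{\Delta l}$ over disorder can be evaluated with the standard replica trick [17]
from

$$\overline{\log p_{\Delta l}} = \frac{1}{N - \Delta l}\sum_m \lim_{n\to 0}\frac{1}{n}\log\frac{\overline{Z^n_{m,m+\Delta l}}}{\overline{Z^n}}, \qquad (3)$$

where the overline denotes the average over the disorder and $\log p_{\Delta l} = -F_{\Delta l}/T$.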
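
The identity underlying the replica trick, $\overline{\log Z} = \lim_{n\to 0} n^{-1}\log\overline{Z^n}$, can be checked numerically on a toy ensemble. The sketch below assumes, purely for illustration, that $\log Z$ is Gaussian-distributed over disorder realizations; it is not a calculation on the heteropolymer model itself, and all names are hypothetical.

```python
import math
import random

rng = random.Random(1)
# Hypothetical disorder: log Z drawn from a Gaussian with mean 0 and variance 1.
log_z_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

def quenched_average(log_z):
    """Direct average of log Z over the sampled disorder."""
    return sum(log_z) / len(log_z)

def replica_estimate(log_z, n):
    """(1/n) * log of the disorder average of Z**n, for real n > 0."""
    mean_zn = sum(math.exp(n * g) for g in log_z) / len(log_z)
    return math.log(mean_zn) / n

exact = quenched_average(log_z_samples)
for n in (1.0, 0.1, 0.01, 0.001):
    estimate = replica_estimate(log_z_samples, n)
    print(f"n = {n:6.3f}: replica estimate - quenched average = {estimate - exact:+.4f}")
```

For a Gaussian $\log Z$ the difference is expected to be close to $n/2$ times the variance, so it vanishes linearly as $n \to 0$, while at $n = 1$ (the annealed average) it stays finite.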


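The constrained partition function $Z^n_{ij}$ defined by a contact $i - j$ can be integrated over the
Gaussian-distributed interaction elements $B_{ij}$ as in ref. [?] to give

$$\overline{Z^n_{ij}} = \int d\{r^\alpha\}\,\Big[\prod_{\alpha}\delta(r^\alpha_i - r^\alpha_j)\Big]\exp\Big[-\beta\sum_\alpha U_1(\{r^\alpha\}) + \frac{\beta^2\sigma^2 v^2}{2}\sum_{\alpha\neq\gamma}\sum_{k\neq l}\delta(r^\alpha_k - r^\alpha_l)\,\delta(r^\gamma_k - r^\gamma_l)\Big], \qquad (4)$$

where $U_1 = \sum_k u_0(|r_k - r_{k-1}| - a) + v B'_0 \sum_{k<l}\delta(r_k - r_l)$, $\alpha$ and $\gamma$ label the replicas,
and $B'_0 = B_0 - \beta\sigma^2$ is the effective
one-replica interaction which controls the density of the chain [?]. A similar expression,
lacking the product of $\delta(r^\alpha_i - r^\alpha_j)$, holds for the unconstrained $\overline{Z^n}$.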

For each pair of replicas, the double sum over monomers in the exponential of Eq. (4) counts
the number of contacts shared by the two replicas. If the chain has density $\rho$, this number is
expected to scale as $N\rho^2$ because each monomer has a probability $\rho$ to be in contact with
another monomer of the same replica, and a probability $\rho$ to be in contact with the same
monomer within the other replica. Thus, the number of shared contacts is independent
of $N$ close to the $\theta$-point, where $\rho = \rho_0 \sim N^{-1/2}$, and decreases above the $\theta$-point, where
$\rho < \rho_0$. If $v \lesssim a^3$ (and, for "biological temperatures", $T \sim \sigma$) the term which couples the
replicas together can be treated perturbatively, obtaining

$$\overline{Z^n_{ij}} = \int d\{r^\alpha\}\,\Big[\prod_\alpha \delta(r^\alpha_i - r^\alpha_j)\Big]\exp\Big[-\beta\sum_\alpha U_1(\{r^\alpha\})\Big]\Big[1 + \frac{\beta^2\sigma^2v^2}{2}\sum_{\alpha\neq\gamma}\sum_{k\neq l}\delta(r^\alpha_k - r^\alpha_l)\,\delta(r^\gamma_k - r^\gamma_l)\Big]. \qquad (5)$$


At variance with the perturbation approach applied to the excluded volume [8], in which
case the perturbing term scales as $(T - \theta)N^{1/2}$ and thus is meaningful only if $T \approx \theta$, in the
present case it can be applied at any temperature. Following a scheme similar to that of ref.
[20] one can write the ratio needed in Eq. (3) as

$$\lim_{n\to 0}\frac{1}{n}\log\frac{\overline{Z^n_{ij}}}{\overline{Z^n}} = \lim_{n\to 0}\frac{1}{n}\log\left[\frac{\overline{Z^n_{ij}}^{(2)}}{\overline{Z^n_{ij}}^{(0)}}\cdot\frac{\overline{Z^n_{ij}}^{(0)}}{\overline{Z^n}^{(0)}}\cdot\frac{\overline{Z^n}^{(0)}}{\overline{Z^n}^{(2)}}\right], \qquad (6)$$

where the superscript (2) indicates the perturbed partition functions, including the term
proportional to $\sigma^2$, while the superscript (0) indicates the unperturbed partition function.
The last fraction is immaterial because it does not depend on $(j - i)$.

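The perturbed constrained partition function can be written as

$$\overline{Z^n_{ij}}^{(2)} = \big(Z^{(0)}_{ij}\big)^n\left[1 + \frac{\beta^2\sigma^2v^2}{2}\,\frac{n(n-1)}{\big(Z^{(0)}_{ij}\big)^2}\sum_{k\neq l}\left(\int d\{r\}\,\delta(r_i - r_j)\,\delta(r_k - r_l)\,e^{-\beta U_1(\{r\})}\right)^2\right], \qquad (7)$$

where $Z^{(0)}_{ij} = \int d\{r\}\,\delta(r_i - r_j)\,e^{-\beta U_1(\{r\})}$ is the unperturbed constrained partition function
of a single replica, and the integral is performed over a single replica. The above sum can be split in terms
defined by the order of the indexes $k$, $l$, $i$ and $j$ along the chain, which can be graphically represented as

    k l i j,   k i l j,   i k l j,   i j k l,   i k j l,   k i j l.   (8)
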
To calculate the integrals in the sums defined by the above graphs, we employ the approximation
that $U_1 \approx U_0$, so that the propagator associated with the solid segments in the
graphs is that of the ideal chain, which in its standard Gaussian form reads

$$G_0(\Delta r, \Delta l) = \left(\frac{3}{2\pi a^2 \Delta l}\right)^{3/2}\exp\left[-\frac{3\,\Delta r^2}{2 a^2 \Delta l}\right], \qquad (9)$$

which is exact if $B'_0 = 0$ and worsens as $B'_0 > 0$ increases. Vertexes are $V(\Delta r) = 2\pi\,\delta(|\Delta r|)$.
Each term of Eq. (8) can be calculated integrating the chain of propagators corresponding
to the graph and approximating the sums over $k$ and $l$ as integrals. For example, the simplest
contribution is


$$\sum_{k<l<i<j}\Big[\Omega\int d^3r_k\,d^3r_l\,d^3r_i\,d^3r_j\,d^3r_N\; G_0(r_k, k)\,G_0(r_l - r_k, l - k)\,V(r_l - r_k)\,\times$$
$$\times\, G_0(r_i - r_l, i - l)\,G_0(r_j - r_i, j - i)\,V(r_j - r_i)\,G_0(r_N - r_j, N - j)\Big]^2 = [\ldots], \qquad (10)$$

where $\Omega$ is the volume of conformational space of the chain, needed because the functions
$G_0$ are probability densities. In four of the six terms of Eq. (8), the leading contribution in
the limit of large $(j - i)$ scales as $(j - i)^{-3}$. Exception is made for the third and the sixth
term, which give

    [...]   (11)

These two terms give a perturbation to $\log p_{ij}$ which depends on $(j - i)$, since the square
of the unperturbed constrained partition function with which one must compare them (see
Eqs. (6) and (7) and below) scales as $(j - i)^{-3}$ for the ideal chain and $(j - i)^{-18/5}$ for the swollen
coil.

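Identifying the conformational-space volume $\Omega$ with [...], Eq. (3) can be calculated
using Eqs. (6), (7) and (11), and performing the limit $n \to 0$, resulting for large $\Delta l$ in

$$\log p_{\Delta l} = \log\left[\frac{1}{\Delta l^{\kappa}}\cdot\exp\left(-\left(\frac{\Delta l}{\Delta l_0}\right)^{2\kappa-2}\right)\right], \qquad (12)$$

where

$$\Delta l_0 = \Big[\,\ldots\,\Big]^{1/(2\kappa-2)}. \qquad (13)$$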


This means that when one plots the contact probabilities of pairs of monomers versus their
separation in log-log scale, the linear behaviour can be detected only for $\Delta l \ll \Delta l_0$, while an
exponential drop dominates at larger values of $\Delta l$. The exponent associated with the power-law
regime does not change with respect to the homopolymeric case, as already
suggested in ref. [21] making use of the renormalization group in three dimensions. Examples
of the exponential correction to the power law are displayed at different temperatures in Fig. 1
for the case of the ideal chain (upper panel) and the random coil (lower panel).

When the temperature is close to the $\theta$-point (i.e., $B'_0 = 0$), then $\kappa = 3/2$ and the effect
of the disorder in the interactions is that of applying an exponential cutoff to the contact
probability of monomers whose linear distance is beyond $\Delta l_0$. To have an order of magnitude
for $\Delta l_0$ in the ideal-chain regime, one can consider the case $B_0 = \sigma$, so that $B'_0 = 0$ implies
$\beta\sigma = 1$ (i.e., $T = \sigma$), and $v/a^3 = 0.5$. In this case, $\Delta l_0 \approx$ 10^[...].

Above the theta point (i.e., $B'_0 > 0$) $\kappa = 9/5$ and the correction to the power-law
behaviour is a stretched exponential with power 8/5, which is not very far from a Gaussian
function. If, for example, we still set $B_0 = \sigma$ and $v/a^3 = 0.5$, but now choose $T = 1.5\sigma$, so
that $B'_0 = \sigma/3 > 0$ and the chain is in a coil phase, now $\Delta l_0 \approx 130$. The coil regime, above
the theta point, is the typical case experienced by proteins at the beginning of the folding
process in the experiments [22].
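
The two numerical examples above can be retraced with a short script. It is only a sketch under stated assumptions: the effective interaction is taken as $B'_0 = B_0 - \sigma^2/T$, the exponent $\kappa$ is chosen from the sign of $B'_0$, and the contact probability is taken in the unnormalised form $\Delta l^{-\kappa}\exp[-(\Delta l/\Delta l_0)^{2\kappa-2}]$ implied by the exponents quoted in the text. The cutoff $\Delta l_0$ is passed in as an input rather than computed, and all function names are hypothetical.

```python
import math

def effective_interaction(b0, sigma, temperature):
    """One-replica effective interaction B0' = B0 - sigma**2 / T (beta = 1/T)."""
    return b0 - sigma ** 2 / temperature

def power_law_exponent(b0_eff, tol=1e-12):
    """kappa = 3/2 at the theta point (B0' = 0), 9/5 in the coil phase (B0' > 0)."""
    if abs(b0_eff) <= tol:
        return 1.5
    if b0_eff > 0:
        return 1.8
    raise ValueError("B0' < 0: globular phase, outside the scope of the calculation")

def contact_probability(delta_l, kappa, delta_l0):
    """Unnormalised p ~ dl**(-kappa) * exp(-(dl / dl0)**(2*kappa - 2))."""
    return delta_l ** (-kappa) * math.exp(-((delta_l / delta_l0) ** (2 * kappa - 2)))

sigma = 1.0
b0 = sigma                      # the choice B0 = sigma used in the text
for temperature in (1.0 * sigma, 1.5 * sigma):
    b0_eff = effective_interaction(b0, sigma, temperature)
    kappa = power_law_exponent(b0_eff)
    print(f"T = {temperature:.1f}: B0' = {b0_eff:.3f}, kappa = {kappa}, "
          f"cutoff power = {2 * kappa - 2:.1f}")

# Coil regime with the cutoff quoted in the text (delta_l0 of about 130).
for delta_l in (10, 130, 500):
    bare = delta_l ** -1.8
    cut = contact_probability(delta_l, 1.8, 130.0)
    print(f"dl = {delta_l:4d}: suppression relative to the pure power law = {cut / bare:.3e}")
```

With $B_0 = \sigma$ the script reproduces $B'_0 = 0$ at $T = \sigma$ and $B'_0 = \sigma/3$ at $T = 1.5\sigma$, and it shows how the stretched-exponential factor leaves short separations almost untouched while strongly suppressing contacts beyond $\Delta l_0$.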

The behaviour of $\Delta l_0$ with respect to the temperature is quite irregular (see Fig. 2).
At the theta point it can be large because, in spite of the small denominator in Eq. (13),
the overall exponent is 1. When the temperature increases just above the theta point, the
denominator becomes somewhat larger, but the overall exponent drops to 5/8, making $\Delta l_0$
small. As temperature is further increased, $\Delta l_0$ becomes larger and eventually diverges. A
consequence of this is that, for each value of $B_0$, there is an intermediate range of temperatures
which penalizes the formation of long-range structure in the elongated conformations
of biopolymers.

Actual biopolymers are finite systems. The upper limit of their length is usually well-defined,
and long biopolymers are structured in domains compatible with this upper limit.
In the case of proteins, single domains are shorter than about 250 residues [6], corresponding
to about 100 Kuhn lengths. Among the factors which constrain the size of single domains
could be the difficulty of establishing long-range contacts in the denatured state due to
the exponential cutoff highlighted above, and consequently of achieving an efficient folding
mechanism. Also in the case of chromatin, whose Kuhn length is 6 · 10^[...] bases [?], the
polymer seems to form domains with a maximum size of the order of 10^[...] to 10^[...] bases [7],
corresponding to hundreds of Kuhn lengths.

The low-temperature globular phases of random heteropolymers have been widely studied
in the past [13, 14], and show glassy behaviour. Interestingly, the effect of disorder on the
contact probability displayed by Eq. (12) appears at rather high temperatures, well above
the glass transition.

FIG. 1: The shape of $p_{\Delta l}$ calculated at $T = 0.2\sigma$ (blue curve), $T = 0.5\sigma$ (purple curve), $T = \sigma$
(green curve) and $T = 1.5\sigma$ (red curve), for the cases $B'_0 = 0$ (upper panel) and $B'_0 > 0$ (lower
panel). The black curves show the power law without exponential correction.

FIG. 2: The dependence of $\Delta l_0$ on the temperature for the cases $B'_0 = 0$ (black curve) and $B'_0 > 0$
(red curve). If one assumes, for instance, that $B_0 = \sigma$, then the transition between the two cases
occurs at $T = \sigma$ (marked by a dotted arrow). At lower temperatures the system is in a globular
phase and the present calculations do not apply.